31 research outputs found

    AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

    Get PDF
    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php

    Genome Majority Vote Improves Gene Predictions

    Get PDF
    Recent studies have noted extensive inconsistencies in gene start sites among orthologous genes in related microbial genomes. Here we provide the first documented evidence that imposing gene start consistency improves the accuracy of gene start-site prediction. We applied an algorithm using a genome majority vote (GMV) scheme to increase the consistency of gene starts among orthologs. We used a set of validated Escherichia coli genes as a standard to quantify accuracy. Results showed that the GMV algorithm can correct hundreds of gene prediction errors in sets of five or ten genomes while introducing few errors. Using a conservative calculation, we project that GMV would resolve many inconsistencies and errors in publicly available microbial gene maps. Our simple and logical solution provides a notable advance toward accurate gene maps

    Blueprint for a minimal photoautotrophic cell: conserved and variable genes in Synechococcus elongatus PCC 7942

    Get PDF
    Background: Simpler biological systems should be easier to understand and to engineer towards pre-defined goals. One way to achieve biological simplicity is through genome minimization. Here we looked for genomic islands in the fresh water cyanobacteria Synechococcus elongatus PCC 7942 (genome size 2.7 Mb) that could be used as targets for deletion. We also looked for conserved genes that might be essential for cell survival.Results: By using a combination of methods we identified 170 xenologs, 136 ORFans and 1401 core genes in the genome of S. elongatus PCC 7942. These represent 6.5%, 5.2% and 53.6% of the annotated genes respectively. We considered that genes in genomic islands could be found if they showed a combination of: a) unusual G+C content; b) unusual phylogenetic similarity; and/or c) a small number of the highly iterated palindrome 1 (HIP1) motif plus an unusual codon usage. The origin of the largest genomic island by horizontal gene transfer (HGT) could be corroborated by lack of coverage among metagenomic sequences from a fresh water microbialite. Evidence is also presented that xenologous genes tend to cluster in operons. Interestingly, most genes coding for proteins with a diguanylate cyclase domain are predicted to be xenologs, suggesting a role for horizontal gene transfer in the evolution of Synechococcus sensory systems.Conclusions: Our estimates of genomic islands in PCC 7942 are larger than those predicted by other published methods like SIGI-HMM. Our results set a guide to non-essential genes in S. elongatus PCC 7942 indicating a path towards the engineering of a model photoautotrophic bacterial cell.Financial support was provided by grants BFU2009-12895-C02-01/BMC (Ministerio de Ciencia e Innovación, Spain), the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 212894 and Prometeo/2009/092 (Conselleria d’Educació, Generalitat Valenciana, Spain) to A. Moya. Work in the FdlC laboratory was supported by grants BFU2008-00995/BMC (Spanish Ministry of Education), RD06/0008/1012 (RETICS research network, Instituto de Salud Carlos III, Spanish Ministry of Health) and LSHM-CT- 2005_019023 (European VI Framework Program). Dr. González-Domenech was supported by grant from the University of Granada. LD, thanks to financial support from Facultad de Ciencias, Universidad Nacional Autónoma de México

    Phylogenomic Analysis of Marine Roseobacters

    Get PDF
    Background: Members of the Roseobacter clade which play a key role in the biogeochemical cycles of the ocean are diverse and abundant, comprising 10–25 % of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter bacteria genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we also discovered the most likely 109 HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzynmes fo

    A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans

    Get PDF
    Sequencing of the complete genome of Ignicoccus hospitalis gives insight into its association with another species of Archaea, Nanoarchaeum equitans
    corecore